 code base


AI coding is now everywhere. But not everyone is convinced.

MIT Technology Review

Developers are navigating confusing gaps between expectation and reality. So are the rest of us. Depending on who you ask, AI-powered coding is either giving software developers an unprecedented productivity boost or churning out masses of poorly designed code that saps their attention and sets software projects up for serious long-term maintenance problems. The problem is that right now, it's not easy to know which is true. As tech giants pour billions into large language models (LLMs), coding has been touted as the technology's killer app. Both Microsoft CEO Satya Nadella and Google CEO Sundar Pichai have claimed that around a quarter of their companies' code is now AI-generated. And in March, Anthropic's CEO, Dario Amodei, predicted that within six months 90% of all code would be written by AI.


Cursor Launches an AI Coding Tool For Designers

WIRED

The 300-person startup hopes bringing designers aboard will give it an edge in an increasingly competitive AI software market. Cursor, the wildly popular AI coding startup, is launching a new feature that lets people design the look and feel of web applications with AI. The tool, Visual Editor, is essentially a vibe-coding product for designers, giving them access to the same fine-grained controls they'd expect from professional design software. But in addition to making changes manually, the tool lets them request edits from Cursor's AI agent using natural language. Cursor is best known for its AI coding platform, but with Visual Editor, the startup wants to capture other parts of the software creation process.


Learning to Code with Context: A Study-Based Approach

Borghoff, Uwe M., Minas, Mark, Schopp, Jannis

arXiv.org Artificial Intelligence

The rapid emergence of generative AI tools is transforming the way software is developed. Consequently, software engineering education must adapt to ensure that students not only learn traditional development methods but also understand how to meaningfully and responsibly use these new technologies. In particular, project-based courses offer an effective environment to explore and evaluate the integration of AI assistance into real-world development practices. This paper presents our approach and a user study conducted within a university programming project in which students collaboratively developed computer games. The study investigates how participants used generative AI tools throughout different phases of the software development process, identifies the types of tasks where such tools were most effective, and analyzes the challenges students encountered. Building on these insights, we further examine a repository-aware, locally deployed large language model (LLM) assistant designed to provide project-contextualized support. The system employs Retrieval-Augmented Generation (RAG) to ground responses in relevant documentation and source code, enabling qualitative analysis of model behavior, parameter sensitivity, and common failure modes. The findings deepen our understanding of context-aware AI support in educational software projects and inform future integration of AI-based assistance into software engineering curricula.
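The retrieval step that a repository-aware assistant of this kind relies on can be illustrated with a minimal sketch. The abstract does not say which retriever or prompt format the authors use, so everything below is an assumption: a bag-of-words cosine similarity stands in for the retriever, and the names `retrieve`, `build_prompt`, and the sample file contents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: dict, k: int = 2) -> list:
    """Return the names of the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda name: cosine(q, Counter(tokenize(chunks[name]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, chunks: dict, k: int = 2) -> str:
    """Assemble a prompt that grounds the question in retrieved project text."""
    names = retrieve(query, chunks, k)
    context = "\n\n".join(f"[{name}]\n{chunks[name]}" for name in names)
    return (
        "Answer using only this project context:\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical repository contents, used only for illustration.
repo_chunks = {
    "player.py": "class Player: def jump(self): velocity",
    "README.md": "Build the game with gradle",
    "enemy.py": "class Enemy: def patrol(self)",
}
print(build_prompt("how does the player jump", repo_chunks, k=1))
```

A real deployment would more likely use embedding-based retrieval over chunked source files and documentation. The point of the sketch is only the overall shape: rank project text against the question, then place the top results in the prompt.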


Empirical Assessment of the Perception of Software Product Line Engineering by an SME before Migrating its Code Base

Georges, Thomas, Huchard, Marianne, König, Mélanie, Nebut, Clémentine, Tibermacine, Chouki

arXiv.org Artificial Intelligence

Migrating a set of software variants into a software product line (SPL) is an expensive and potentially challenging endeavor. Indeed, SPL engineering can significantly impact a company's development process and often requires changes to established developer practices. The work presented in this paper stems from a collaboration with a Small and Medium-sized Enterprise (SME) that decided to migrate its existing code base into an SPL. In this study, we conducted an in-depth evaluation of the company's current development processes and practices, as well as the anticipated benefits and risks associated with the migration. Key stakeholders involved in software development participated in this evaluation to provide insight into their perceptions of the migration and their potential resistance to change. This paper describes the design of the interviews conducted with these stakeholders and presents an analysis of the results. Among the qualitative findings, we observed that all participants, regardless of their role in the development process, identified benefits of the migration relevant to their own activities. Furthermore, our results suggest that an effective risk mitigation strategy involves keeping stakeholders informed and engaged throughout the process, preserving as many good practices as possible, and actively involving them in the migration to ensure a smooth transition and minimize potential challenges.


Ruby Is Not a Serious Programming Language

WIRED

Ruby survives on affection, not utility. My little theory is that the concept of "imprinting" in psychology can just as easily be applied to programming: Much as a baby goose decides that the first moving life-form it encounters is its parent, embryonic programmers form ineradicable attachments to the patterns and quiddities of their first formative language. For many people, that language is Ruby. It's often credited with making programming "click"; imprintees speak of it with a certain indebtedness and affection.


Review for NeurIPS paper: Neural Bridge Sampling for Evaluating Safety-Critical Autonomous Systems

Neural Information Processing Systems

Summary and Contributions: i) The authors set out to deploy probabilistic methods to determine the probability of dangerous events and the safety of a given system, where dangerous events are simulated in a custom-built simulator that combines exploration, exploitation, and optimization techniques to find failure modes and estimate their rate of occurrence. Summary: They combine an adapted version of HMC, which they call warped HMC, that uses normalizing flows and bridge sampling through sequential updates to extract samples corresponding to rare events in a variety of scenarios generated via stochastic simulation. This paper shares some themes with NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport, but it also combines a series of other techniques. I had read this two weeks ago and contributed to the discussions, so I apologise for the delay in the update. Just a few points, and I believe the AC and other reviewers have provided you with more feedback.


PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)

Nazzal, Mahmoud, Khalil, Issa, Khreishah, Abdallah, Phan, NhatHai

arXiv.org Artificial Intelligence

The capability of generating high-quality source code using large language models (LLMs) reduces software development time and costs. However, LLMs often introduce security vulnerabilities due to training on insecure open-source data. This highlights the need for ensuring secure and functional code generation. This paper introduces PromSec, an algorithm for prompt optimization for secure and functioning code generation using LLMs. In PromSec, we combine 1) code vulnerability clearing using a generative adversarial graph neural network, dubbed gGAN, to fix and reduce security vulnerabilities in generated code and 2) code generation using an LLM into an interactive loop, such that the outcome of the gGAN drives the LLM with enhanced prompts to generate secure code while preserving its functionality. Introducing a new contrastive learning approach in gGAN, we formulate code-clearing and generation as a dual-objective optimization problem, enabling PromSec to notably reduce the number of LLM inferences. PromSec offers a cost-effective and practical solution for generating secure, functional code. Extensive experiments conducted on Python and Java code datasets confirm that PromSec effectively enhances code security while upholding its intended functionality. Our experiments show that while a state-of-the-art approach fails to address all code vulnerabilities, PromSec effectively resolves them. Moreover, PromSec achieves more than an order-of-magnitude reduction in operation time, number of LLM queries, and security analysis costs. Furthermore, prompts optimized with PromSec for a certain LLM are transferable to other LLMs across programming languages and generalizable to unseen vulnerabilities in training. This study is a step in enhancing the trustworthiness of LLMs for secure and functional code generation, supporting their integration into real-world software development.


Leveraging Large Language Models for Efficient Failure Analysis in Game Development

Marini, Leonardo, Gisslén, Linus, Sestini, Alessandro

arXiv.org Artificial Intelligence

In games, and more generally in the field of software development, early detection of bugs is vital to maintain a high quality of the final product. Automated tests are a powerful tool that can catch a problem earlier in development by executing periodically. As an example, when new code is submitted to the code base, a new automated test verifies these changes. However, identifying the specific change responsible for a test failure becomes harder when dealing with batches of changes -- especially in the case of a large-scale project such as a AAA game, where thousands of people contribute to a single code base. This paper proposes a new approach to automatically identify which change in the code caused a test to fail. The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure. We investigate the effectiveness of our approach with quantitative and qualitative evaluations. Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year. We further evaluated our model through a user study to assess the utility and usability of the tool from a developer perspective, resulting in a significant reduction in time -- up to 60% -- spent investigating issues.


RepairAgent: An Autonomous, LLM-Based Agent for Program Repair

Bouzenia, Islem, Devanbu, Premkumar, Pradel, Michael

arXiv.org Artificial Intelligence

Automated program repair has emerged as a powerful technique to mitigate the impact of software bugs on system reliability and user experience. This paper introduces RepairAgent, the first work to address the program repair challenge through an autonomous agent based on a large language model (LLM). Unlike existing deep learning-based approaches, which prompt a model with a fixed prompt or in a fixed feedback loop, our work treats the LLM as an agent capable of autonomously planning and executing actions to fix bugs by invoking suitable tools. RepairAgent freely interleaves gathering information about the bug, gathering repair ingredients, and validating fixes, while deciding which tools to invoke based on the gathered information and feedback from previous fix attempts. Key contributions that enable RepairAgent include a set of tools that are useful for program repair, a dynamically updated prompt format that allows the LLM to interact with these tools, and a finite state machine that guides the agent in invoking the tools. Our evaluation on the popular Defects4J dataset demonstrates RepairAgent's effectiveness in autonomously repairing 164 bugs, including 39 bugs not fixed by prior techniques. Interacting with the LLM imposes an average cost of 270,000 tokens per bug, which, under the current pricing of OpenAI's GPT-3.5 model, translates to 14 cents of USD per bug. To the best of our knowledge, this work is the first to present an autonomous, LLM-based agent for program repair, paving the way for future agent-based techniques in software engineering.
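The two cost figures quoted above (270,000 tokens and 14 cents per bug) imply a blended price per token, which can be worked out directly. This is only back-of-the-envelope arithmetic on the abstract's numbers: it assumes a single blended rate and ignores any difference between input and output token pricing. The function name is hypothetical.

```python
# Figures quoted in the abstract above.
AVG_TOKENS_PER_BUG = 270_000
AVG_COST_PER_BUG_USD = 0.14  # "14 cents"


def implied_price_per_1k_tokens(tokens: int, cost_usd: float) -> float:
    """Return the blended USD price per 1,000 tokens implied by a total cost."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return cost_usd / (tokens / 1000)


price = implied_price_per_1k_tokens(AVG_TOKENS_PER_BUG, AVG_COST_PER_BUG_USD)
print(f"Implied blended price: ${price:.5f} per 1K tokens")
```

The result is roughly $0.0005 per 1,000 tokens, or about 0.05 cents.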


L2MAC: Large Language Model Automatic Computer for Unbounded Code Generation

Holt, Samuel, Luyten, Max Ruiz, van der Schaar, Mihaela

arXiv.org Artificial Intelligence

Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture, hindering their ability to produce long and logically consistent code. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long code generation tasks since they (1) only focus on reading memory and reduce its evolution to the concatenation of new memories or (2) use very specialized memories that cannot adapt to other domains. This paper presents L2MAC, the first practical LLM-based stored-program automatic computer for long and consistent code generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which will contain the final and intermediate outputs. Each instruction is executed by a separate LLM instance, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate virtually unbounded code structures, bypassing the constraints of the finite context window while producing code that fulfills complex user-specified requirements. We empirically show that L2MAC succeeds in generating large code bases for system design tasks where other coding methods fall short in implementing user requirements and provide insight into the reasons for this performance gap.
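The architecture described above (an instruction registry, a file store, and a control unit that gives each instruction a fresh context) can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the LLM is replaced by a deterministic stub, and the names `run_control_unit`, `build_context`, and `stub_llm` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: an "LLM" maps a context string to (filename, content).
LLM = Callable[[str], Tuple[str, str]]


def build_context(instruction: str, file_store: Dict[str, str]) -> str:
    """Build a bounded context: one instruction plus a summary of stored files."""
    listing = "\n".join(
        f"- {name} ({len(body)} chars)" for name, body in sorted(file_store.items())
    )
    return f"Instruction: {instruction}\nExisting files:\n{listing or '- (none)'}"


def run_control_unit(instruction_registry: List[str], llm: LLM) -> Dict[str, str]:
    """Execute each instruction with a fresh context and write results to the store."""
    file_store: Dict[str, str] = {}
    for instruction in instruction_registry:
        # Each call sees only its own context, so total output is not limited
        # by a single context window.
        context = build_context(instruction, file_store)
        filename, content = llm(context)
        file_store[filename] = content
    return file_store


def stub_llm(context: str) -> Tuple[str, str]:
    """Stand-in for a model call: treats the instruction's last word as a filename."""
    instruction = context.splitlines()[0][len("Instruction: "):]
    filename = instruction.split()[-1]
    return filename, f"# generated for: {instruction}\n"


store = run_control_unit(["create models.py", "create views.py"], stub_llm)
for name in sorted(store):
    print(name, "->", store[name].strip())
```

The design point the sketch tries to capture is that state lives in the file store rather than in the model's context, so each step only needs to read the parts of memory relevant to its instruction.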